fix: forward warmup_ratio correctly to TrainingArguments#21
Conversation
warmup_ratio (float 0.03) was passed to warmup_steps (int), truncating to 0 and silently disabling warmup entirely. Add a warmup_steps config field (default 0) and forward both fields to TrainingArguments so each reaches the correct parameter.
KonstiNik
left a comment
There was a problem hiding this comment.
Thanks for picking this up!
Tho, I'd argue we should follow HF's API more directly here. Maintaining warmup_ratio and warmup_steps as two separate fields for the same underlying concept doesn't really hold up — HF themselves consolidated this in 5.2 into a single warmup_steps: float where < 1 is interpreted as a ratio and >= 1 as absolute steps (training_args.py:788-793, training_args.py:2047-2054). One knob covers both use cases cleanly.
Concretely I suggest:
- Drop
warmup_ratiofrom TrainingConfig and rename to warmup_steps. The piece I think matters most: users need to be told what to put in this field, since accepting both ratios and absolute step counts is non-obvious from the name alone. Use thefield(metadata=...)form (same pattern HF uses inTrainingArguments), and we can surface it in--helpor auto-generated docs later:
warmup_steps: float = field(
default=0.03,
metadata={
"help": (
"Linear warmup duration. Values in [0, 1) are interpreted as a "
"ratio of total training steps; values >= 1 are interpreted as an "
"absolute number of steps; 0 disables warmup."
)
},
)
- Forward only
warmup_stepstoTrainingArguments.
Add a small shim inPostTrainingConfig.load(config.py:270, betweenOmegaConf.loadand the schema merge) so old YAMLs don't break:
if training is not None and "warmup_ratio" in training:
logger.warning(
"training.warmup_ratio is deprecated; use training.warmup_steps "
"(values < 1 are interpreted as a ratio). Auto-migrating."
)
training.warmup_steps = training.pop("warmup_ratio")
(Needs import logging + a module-level logger — neither exists in config.py today.)
…at field Drop warmup_ratio; warmup_steps is now a float where values in [0, 1) are treated as a ratio and values >= 1 as absolute steps, matching HF's TrainingArguments behaviour. Adds a backward-compat shim that auto-migrates warmup_ratio in old YAMLs with a deprecation warning.
|
Done. |
KonstiNik
left a comment
There was a problem hiding this comment.
Refactor looks great. One small question: the default went from 0.03 to 0.0, so configs that didn't set warmup explicitly will now get no warmup instead of 3%. Was the default change intentional, or worth flipping back to 0.03 to keep the old behavior?
It was intentional. TRL's default is 0.0 so I thought it best to mimic it. |
Summary
Drop
warmup_ratiofromTrainingConfigand makewarmup_stepsafloat. The original bug:warmup_ratio(e.g.0.03) was being forwarded towarmup_steps(anintfield inTrainingArguments), silently truncating to0and disabling warmup entirely. HuggingFaceTrainingArgumentsalready interprets a float value forwarmup_stepsas a ratio, so a singlefloatfield covers both use cases with no special handling needed.Type of change